test: improve CLI test determinism and remove redundant test logic#1123

Merged
cv merged 7 commits into NVIDIA:main from
ksapru:fix/dead-code-cleanup-v2
Apr 1, 2026

Conversation

@ksapru ksapru commented Mar 30, 2026

Summary

Improves test determinism, consistency, and reliability across CLI, uninstall, and blueprint test suites by standardizing shell invocation, tightening execution patterns, and removing redundant or outdated test code.


Related Issue

Fixes #977 (part 1)


Changes

  • Normalize shell invocation:
    • Replace bash -lc with bash -c in uninstall tests to avoid shell initialization side effects
  • Improve CLI test stability:
    • Increase timeouts for long-running commands
    • Standardize usage of runWithEnv(..., timeout)
  • Remove redundant / outdated test code:
    • Clean up unused or deprecated test logic in runner.test.ts
  • Improve test consistency:
    • Align execution patterns across CLI and uninstall tests
  • Preserve security coverage:
    • Maintain regression protections (e.g., path validation and credential handling)

Verification

  • npm test passes locally
  • npx prek run --all-files passes in CI
  • No changes to CLI behavior or runtime logic
  • Existing security and regression tests continue to pass

Rationale

Some tests relied on shell initialization behavior (bash -lc) and inconsistent execution patterns, leading to flakiness and non-deterministic outcomes.

These updates:

  • eliminate shell-dependent variability
  • standardize execution across test suites
  • improve reliability without impacting functionality

Additionally, minor cleanup removes redundant or outdated test code to improve maintainability.
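
A minimal sketch of the pattern described above, assuming `bash` is available on `PATH`: the snippet runs under `bash -c` (so login-shell startup files such as `~/.bash_profile` are not read) and `HOME` is supplied through `env` instead of being embedded in the command string. The helper name `runInCleanShell` and the temp-dir prefix are hypothetical and not taken from the test suite.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: run a snippet under a non-login bash with an explicit HOME.
function runInCleanShell(script, home) {
  return spawnSync("bash", ["-c", script], {
    encoding: "utf-8",
    env: { ...process.env, HOME: home },
  });
}

// Use a throwaway HOME so nothing from the developer's real home leaks in.
const tmpHome = fs.mkdtempSync(path.join(os.tmpdir(), "clean-shell-"));
let shellResult;
try {
  shellResult = runInCleanShell('printf "%s" "$HOME"', tmpHome);
} finally {
  fs.rmSync(tmpHome, { recursive: true, force: true });
}
```

Here `shellResult.stdout` should be the temporary directory path, since the child shell only sees the `HOME` that was passed in.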


Risk Assessment

Low risk

  • Changes are limited to test code and execution behavior
  • No production code paths modified
  • Security and regression coverage preserved

Rollback

  • Fully reversible by reverting test changes

Type of Change

  • Test / infrastructure improvement (no behavioral change)
  • Code cleanup / maintenance

Testing

  • npm test passes
  • npx prek run --all-files passes (CI)

Checklist

General

  • Contributing guide followed

Code Changes

  • Formatters applied
  • No user-facing behavior changes
  • No secrets committed

Summary by CodeRabbit (updated)

  • Tests
    • Improved CLI and uninstall test determinism by standardizing shell invocation
    • Increased timeouts to reduce flakiness in long-running test cases
    • Removed redundant or outdated test logic for improved maintainability

Summary by CodeRabbit

  • Tests
    • Improved TypeScript typing inside test mocks for safer compilation.
    • Added a shared snapshot constant and strengthened a null-check assertion.
    • Simplified test environment handling by explicitly setting HOME and adjusting shell invocation semantics.
    • Removed redundant inline comment assertions and tightened output checks.
    • Minor formatting and clarity tweaks across test suites for easier maintenance.

Signed-off-by: Krish Sapru [email protected]

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Type-only updates to Vitest vi.mock("node:fs") factories in blueprint tests (casting importOriginal() to typeof import("node:fs")); snapshot.test.ts adds a SNAP constant and an explicit null-check; test/uninstall.test.js adjusts fake npm env, shell flags, and spawned-process env usage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Blueprint tests (type cast): nemoclaw/src/blueprint/runner.test.ts, nemoclaw/src/blueprint/state.test.ts, nemoclaw/src/blueprint/snapshot.test.ts | Vitest vi.mock("node:fs") async factories now cast importOriginal() to typeof import("node:fs") before spreading — compile-time TypeScript typing change only (no runtime behavior change). |
| Snapshot test constants & assertion: nemoclaw/src/blueprint/snapshot.test.ts | Introduce SNAP constant ("/snap/20260323"), replace repeated literal paths with SNAP/template variants, and add expect(result).not.toBeNull() before result! usage. |
| Uninstall tests (env & shell): test/uninstall.test.js | createFakeNpmEnv(tmp) now forces HOME: tmp; several spawnSync calls switch ["-lc", "..."] → ["-c", "..."]; some spawned calls set env: { ...process.env, HOME: tmp }; minor formatting and comment edits. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through mocks and tests today,
Cast imports so types won't stray,
I gathered snapshots in a single name,
Trimmed shell flags and set HOME the same,
I nibble code and scamper away. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | Changes to nemoclaw/src/blueprint test files appear out of scope relative to issue #977's objective, which is to decide on remediation for dead modules—not to refactor their tests. | Either remove TypeScript type-casting refactoring in blueprint test files (runner.test.ts, snapshot.test.ts, state.test.ts) and focus solely on uninstall/CLI tests, or clarify how blueprint test changes support issue #977's decision objective. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Linked Issues check | ❓ Inconclusive | The PR's primary changes (test improvements, TypeScript type assertions) do not directly address issue #977's core objective of deciding how to handle four unreachable dead code modules. | Clarify whether this PR partially resolves issue #977 by improving test coverage/stability as groundwork for a future decision, or if it is disconnected from the dead code remediation objective. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test: improve CLI test determinism and remove redundant test logic' accurately reflects the main changes: standardizing test suites and improving stability without modifying production code. |


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemoclaw/src/blueprint/runner.test.ts (1)

577-641: ⚠️ Potential issue | 🟠 Major

Please restore regression coverage for apply --plan rejection.

main no longer tests the unsupported --plan path, but runtime still rejects it in actionApply. This leaves CLI parse/dispatch behavior unguarded.

Proposed test addition
   describe("main (CLI)", () => {
@@
     it("parses apply with --profile and --endpoint-url", async () => {
       await main(["apply", "--profile", "default", "--endpoint-url", "https://override.test/v1"]);
       expect(mockedValidateEndpoint).toHaveBeenCalledWith("https://override.test/v1");
       expect(stdoutText()).toContain("PROGRESS:100:Apply complete");
     });
+
+    it("rejects apply when --plan is provided (not yet implemented)", async () => {
+      await expect(
+        main(["apply", "--profile", "default", "--plan", "/tmp/plan.json"]),
+      ).rejects.toThrow(/--plan is not yet implemented/);
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.test.ts` around lines 577 - 641, Add a test in
the existing "main (CLI)" suite that exercises the unsupported apply --plan
path: call main with arguments like ["apply","--plan","some-plan.json"] (after
the existing beforeEach setup) and assert it rejects with an error containing
"--plan" (or the exact rejection text emitted by actionApply); this restores
regression coverage for the main -> actionApply dispatch path and ensures CLI
parsing still rejects the --plan option at runtime.
test/cli.test.js (1)

16-27: ⚠️ Potential issue | 🔴 Critical

Critical bug: spawnSync is misconfigured — tests will fail with TypeError: r.out.includes is not a function.

The current implementation has multiple issues:

  1. spawnSync does not throw exceptions — unlike execSync, it always returns a result object with an error property. The try-catch block will never catch non-zero exits; the returned out is always the full result object {error, status, stdout, stderr, ...}, which has no .includes() method.

  2. Tests call .includes() on an object — every test assertion like r.out.includes("Getting Started") will fail at runtime with TypeError: r.out.includes is not a function.

  3. Missing shell: true — without it, spawnSync treats the string as a literal executable name (looking for a file named node "${CLI}" ${args}), resulting in ENOENT instead of executing the shell command.

To fix, use:

Corrected implementation
 function runWithEnv(args, env = {}, timeout = 10000) {
-  try {
-    const out = spawnSync(`node "${CLI}" ${args}`, {
-      encoding: "utf-8",
-      timeout,
-      env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
-    });
-    return { code: 0, out };
-  } catch (err) {
-    return { code: err.status, out: (err.stdout || "") + (err.stderr || "") };
-  }
+  const result = spawnSync(`node "${CLI}" ${args}`, {
+    shell: true,
+    encoding: "utf-8",
+    timeout,
+    env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env },
+  });
+  const out = (result.stdout || "") + (result.stderr || "");
+  return { code: result.status ?? 1, out };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 16 - 27, The runWithEnv function misuses
spawnSync: it never throws, returns a result object (so tests calling
r.out.includes fail), and the command string needs shell: true; fix runWithEnv
by calling spawnSync with shell: true (or pass command and args as an array),
then read the returned result.stdout/stderr (convert to string) and
result.status/result.error to determine exit code; return { code: <numeric
status or error.status>, out: <stdout + stderr as string> } so callers can
safely call r.out.includes; update references in runWithEnv to use the result
object fields instead of assuming spawnSync throws.
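
The three behaviors called out in this finding can be checked in isolation. The following is a small sketch, not code from the repository; it uses `process.execPath` to launch Node, and the inline scripts are made up for illustration.

```javascript
const { spawnSync } = require("node:child_process");

// A non-zero exit does not throw; it is reported through `status`.
const failing = spawnSync(process.execPath, ["-e", "process.exit(3)"], {
  encoding: "utf-8",
});

// Without `shell: true`, the whole string is treated as the executable name,
// so the spawn fails and the failure is reported through `error` (no throw).
const misparsed = spawnSync('node -e "process.exit(0)"', { encoding: "utf-8" });

// With `encoding` set, `stdout` is a string, so `.includes()` works on it
// (but not on the result object itself).
const ok = spawnSync(process.execPath, ["-e", "console.log('Getting Started')"], {
  encoding: "utf-8",
});
```

In this sketch `failing.status` is the child's exit code, `misparsed.error.code` is `"ENOENT"`, and only `ok.stdout` (not `ok`) has an `.includes()` method.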

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10c32d50-fb8f-495a-8091-6c998082c50e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a97e89 and e965901.

📒 Files selected for processing (5)
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/snapshot.test.ts
  • nemoclaw/src/blueprint/state.test.ts
  • test/cli.test.js
  • test/uninstall.test.js

@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch from e965901 to 93ab0bb Compare March 30, 2026 21:46
@cv cv left a comment

Thanks — the determinism goal here makes sense, but I think this needs a bit more work before merge.

Two blockers from the current diff:

  1. test/cli.test.js: the execSync -> spawnSync swap is not equivalent as written. spawnSync(`node "${CLI}" ${args}`, ...) will try to execute a binary with that full string as the executable name unless shell: true is set, so this should hit ENOENT. Also, spawnSync returns { status, stdout, stderr, error } and does not throw on non-zero exit, so the helper now returns { code: 0, out: resultObj } on success instead of a string, and the existing error path no longer matches execSync semantics. I think this is the likely cause of the failing test-unit job. If we want spawnSync here, I would switch to spawnSync("node", [CLI, ...args], ...) and rebuild the helper around status/stdout/stderr/error.

  2. nemoclaw/src/blueprint/runner.test.ts: I don’t think the --plan test is redundant yet. runner.ts on current main still explicitly throws --plan is not yet implemented... in actionApply(), so removing this test drops coverage for behavior that still exists in production code.

Optional follow-up: the bash -lc -> bash -c direction in test/uninstall.test.js seems reasonable, but the file is not Prettier-clean right now, which may explain the red lint job. Also, for the HOME cases, setting HOME via env is safer than embedding HOME="..." source ... inside the command string.

Happy to re-review once those are addressed.
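
For reference, a rough sketch of what a helper built around status/stdout/stderr/error could look like. This is an illustration under assumptions, not the repository's helper: the name `runCli`, taking the arguments as an array, and launching through `process.execPath` instead of a bare `"node"` are all choices made here.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: `cliPath` and the argument array are supplied by the caller.
function runCli(cliPath, args, env = {}, timeout = 10000) {
  const result = spawnSync(process.execPath, [cliPath, ...args], {
    encoding: "utf-8",
    timeout,
    env: { ...process.env, ...env },
  });
  if (result.error) {
    // Spawn failure or timeout: there is no meaningful exit status to report.
    return { code: 1, out: String(result.error.message) };
  }
  // `status` is null when the child was killed by a signal; treat that as failure.
  return { code: result.status ?? 1, out: (result.stdout || "") + (result.stderr || "") };
}
```

Passing the arguments as an array avoids shell quoting entirely, at the cost of changing call sites that currently pass a single string.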

@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch from 07e6b03 to 8324977 Compare March 30, 2026 21:54
ksapru commented Mar 31, 2026

I’ve reverted the execSync → spawnSync change in test/cli.test.js. The previous swap wasn’t equivalent (as you pointed out: ENOENT risk + different return/error semantics), and keeping execSync preserves the current behavior and test expectations.

Also re-added the --plan test to retain coverage for the existing behavior in runner.ts.

For test/uninstall.test.js, I switched to bash -c and moved HOME into env for more deterministic behavior. I’ll make sure the file is Prettier-clean as well.

Happy to revisit a proper spawnSync refactor separately if that’s something we want to pursue.

@ksapru ksapru requested a review from cv March 31, 2026 13:07
@wscurran wscurran added the "enhancement: testing" and "fix" labels Mar 31, 2026
@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch from 90d9218 to 1cc0663 Compare March 31, 2026 18:11
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
test/uninstall.test.js (1)

98-122: Clean up temp directories consistently in helper tests.

These three tests create temp dirs but don’t remove them, unlike the --yes test. Please wrap each in try/finally (or extract a small temp-dir helper) to avoid /tmp buildup in repeated local/CI runs.

♻️ Suggested pattern
 it("removes the user-local nemoclaw shim", () => {
   const tmp = fs.mkdtempSync(
     path.join(os.tmpdir(), "nemoclaw-uninstall-shim-"),
   );
-  const shimDir = path.join(tmp, ".local", "bin");
-  const shimPath = path.join(shimDir, "nemoclaw");
-  const targetPath = path.join(tmp, "prefix", "bin", "nemoclaw");
-
-  fs.mkdirSync(shimDir, { recursive: true });
-  fs.mkdirSync(path.dirname(targetPath), { recursive: true });
-  fs.writeFileSync(targetPath, "#!/usr/bin/env bash\n", { mode: 0o755 });
-  fs.symlinkSync(targetPath, shimPath);
-
-  const result = spawnSync(
-    "bash",
-    ["-c", `source "${UNINSTALL_SCRIPT}"; remove_nemoclaw_cli`],
-    {
-      cwd: path.join(import.meta.dirname, ".."),
-      encoding: "utf-8",
-      env: createFakeNpmEnv(tmp),
-    },
-  );
-
-  expect(result.status).toBe(0);
-  expect(fs.existsSync(shimPath)).toBe(false);
+  try {
+    const shimDir = path.join(tmp, ".local", "bin");
+    const shimPath = path.join(shimDir, "nemoclaw");
+    const targetPath = path.join(tmp, "prefix", "bin", "nemoclaw");
+
+    fs.mkdirSync(shimDir, { recursive: true });
+    fs.mkdirSync(path.dirname(targetPath), { recursive: true });
+    fs.writeFileSync(targetPath, "#!/usr/bin/env bash\n", { mode: 0o755 });
+    fs.symlinkSync(targetPath, shimPath);
+
+    const result = spawnSync(
+      "bash",
+      ["-c", `source "${UNINSTALL_SCRIPT}"; remove_nemoclaw_cli`],
+      {
+        cwd: path.join(import.meta.dirname, ".."),
+        encoding: "utf-8",
+        env: createFakeNpmEnv(tmp),
+      },
+    );
+
+    expect(result.status).toBe(0);
+    expect(fs.existsSync(shimPath)).toBe(false);
+  } finally {
+    fs.rmSync(tmp, { recursive: true, force: true });
+  }
 });

Also applies to: 125-149, 152-174
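
The "small temp-dir helper" alternative could look roughly like the sketch below. The name `withTempDir` is hypothetical, and the helper only covers synchronous callbacks; an async callback would need an `async`/`await` variant so cleanup waits for the promise to settle.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: create a temp dir, run `fn(tmp)`, and always remove the dir,
// whether `fn` returns normally or throws.
function withTempDir(prefix, fn) {
  const tmp = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  try {
    return fn(tmp);
  } finally {
    fs.rmSync(tmp, { recursive: true, force: true });
  }
}
```

A test body would then become `withTempDir("nemoclaw-uninstall-shim-", (tmp) => { ... })`, keeping the setup and assertions unchanged inside the callback.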

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/uninstall.test.js` around lines 98 - 122, The tests creating temporary
directories (variables tmp, shimDir, shimPath, targetPath) and invoking
spawnSync to run remove_nemoclaw_cli should ensure the temp dir is removed after
each test; wrap the setup + test execution in a try/finally (or use a small
temp-dir helper) and call fs.rmSync(tmp, { recursive: true, force: true }) in
the finally block so the temp directory is always cleaned even on failures;
apply the same pattern to the other similar tests that create temp dirs (the
blocks around the other spawnSync calls using createFakeNpmEnv and
UNINSTALL_SCRIPT).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4f69b1b4-a9aa-4c7d-8f0b-1980f8407c4e

📥 Commits

Reviewing files that changed from the base of the PR and between 8fa35a6 and 2c48ebc.

📒 Files selected for processing (2)
  • nemoclaw/src/blueprint/snapshot.test.ts
  • test/uninstall.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemoclaw/src/blueprint/snapshot.test.ts

ksapru commented Mar 31, 2026

Workflows should be good to run, if all looks good. I've signed the commit using a GPG key, resolved linting issues and added a signature at the bottom of the PR to ensure ownership. Let me know if you have any questions/concerns.

@jyaunches jyaunches self-requested a review March 31, 2026 20:22
@jyaunches

@cv This looks good to merge.

cv commented Mar 31, 2026

@ksapru I found out we were not formatting a huge chunk of the code (#1200), so a lot of PRs will need to be reformatted. Sorry!

@ksapru ksapru force-pushed the fix/dead-code-cleanup-v2 branch 2 times, most recently from 10615a5 to 753e559 Compare April 1, 2026 01:50
@cv cv merged commit 71b4141 into NVIDIA:main Apr 1, 2026
1 check passed
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
…1123)

Signed-off-by: Krish Sapru <[email protected]>

---------

Co-authored-by: Carlos Villela <[email protected]>

Labels

enhancement: testing, fix

Development

Successfully merging this pull request may close these issues.

Dead plugin modules: 4 TypeScript files ship in dist/ but are unreachable at runtime

4 participants